
278 research outputs found

Individual Privacy vs Population Privacy: Learning to Attack Anonymization

Author: Cormode Graham
Publication date: 10/11/2010

Over the last decade there have been great strides made in developing techniques to compute functions privately. In particular, Differential Privacy gives strong promises about conclusions that can be drawn about an individual. In contrast, various syntactic methods for providing privacy (criteria such as k-anonymity and l-diversity) have been criticized for still allowing private information of an individual to be inferred. In this report, we consider the ability of an attacker to use data meeting privacy definitions to build an accurate classifier. We demonstrate that even under Differential Privacy, such classifiers can be used to accurately infer "private" attributes in realistic data. We compare this to similar approaches for inference-based attacks on other forms of anonymized data. We place these attacks on the same scale, and observe that the accuracy of inference of private attributes for Differentially Private data and l-diverse data can be quite similar.

arXiv.org e-Print Archive

CiteSeerX

Tight Lower Bound for Comparison-Based Quantile Summaries

Author: Cormode Graham
Veselý Pavel
Publication date: 16/01/2020

Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items, drawn from a totally ordered universe. We study data structures, called quantile summaries, which keep track of all quantiles, up to an error of at most $\varepsilon$. That is, an $\varepsilon$-approximate quantile summary first processes a stream of items and then, given any quantile query $0\le \phi\le 1$, returns an item from the stream, which is a $\phi'$-quantile for some $\phi' = \phi \pm \varepsilon$. We focus on comparison-based quantile summaries that can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, due to Greenwald and Khanna (SIGMOD '01), stores at most $O(\frac{1}{\varepsilon}\cdot \log \varepsilon N)$ items, where $N$ is the number of items in the stream. We prove that this space bound is optimal by showing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space $f(\varepsilon)\cdot o(\log N)$, for any function $f$ that does not depend on $N$. As a corollary, we improve the lower bound for biased quantiles, which provide a stronger, relative-error guarantee of $(1\pm \varepsilon)\cdot \phi$, and for other related computational tasks. Comment: 20 pages, 2 figures, major revision of the construction (Sec. 3) and some other parts of the paper
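The $\varepsilon$-approximate guarantee described in this abstract can be illustrated with a minimal offline sketch. This is not the Greenwald and Khanna streaming summary and not the construction from the paper: it assumes the whole input is available and can be sorted, and the names `build_summary` and `query` are hypothetical, chosen only for illustration. It keeps roughly $1/\varepsilon$ items at evenly spaced ranks, so any query $\phi$ is answered by an item whose rank is within $\varepsilon N$ of $\phi N$.

```python
import math


def build_summary(items, eps):
    """Offline illustration (assumes a non-empty input and 0 < eps <= 1).

    Sorts the input and keeps every `step`-th item together with its
    1-based rank, where step = floor(eps * N), or 1 if that is zero.
    """
    data = sorted(items)
    n = len(data)
    step = max(1, math.floor(eps * n))
    summary = [(rank, data[rank - 1]) for rank in range(step, n + 1, step)]
    # Always keep the maximum so that queries near phi = 1 are covered.
    if summary[-1][0] != n:
        summary.append((n, data[-1]))
    return summary, n


def query(summary, n, phi):
    """Return the stored item whose rank is closest to phi * n."""
    target = phi * n
    _rank, item = min(summary, key=lambda pair: abs(pair[0] - target))
    return item


if __name__ == "__main__":
    # Items 1..1000 in reverse order, so the rank of an item equals its value.
    summary, n = build_summary(range(1000, 0, -1), eps=0.05)
    print(len(summary), query(summary, n, 0.5))
```

The sketch only compares items with one another, so it is comparison-based in the sense of the abstract, but it is not a streaming algorithm: it needs all $N$ items at once, whereas the summaries studied in the paper must process the stream in small space.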

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

Iterative hessian sketch in input sparsity time

Author: Cormode Graham
Dickens Charlie

Warwick Research Archives Portal Repository

Engineering Streaming Algorithms

Author: Cormode Graham
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 16th International Symposium on Experimental Algorithms (SEA 2017)
Publication date: 01/01/2017

Streaming algorithms must process a large quantity of small updates quickly to allow queries about the input to be answered from a small summary. Initial work on streaming algorithms laid out theoretical results, and subsequent efforts have involved engineering these for practical use. Informed by experiments, streaming algorithms have been widely implemented and used in practice. This talk will survey this line of work, and identify some lessons learned.

Dagstuhl Research Online Publication Server

First Author Advantage: Citation Labeling in Research

Author: Cormode Graham
Muthukrishnan S.
Yan Jinyun
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013

Citations among research papers, and the networks they form, are the primary object of study in scientometrics. The act of making a citation reflects the citer's knowledge of the related literature, and of the work being cited. We aim to gain insight into this process by studying citation keys: user-chosen labels to identify a cited work. Our main observation is that the first listed author is disproportionately represented in such labels, implying a strong mental bias towards the first author. Comment: Computational Scientometrics: Theory and Applications at The 22nd CIKM 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Scienceography: the study of how science is written

Author: Cormode Graham
Muthukrishnan S.
Yan Jinyun
Publication date: 01/01/2012

Scientific literature has itself been the subject of much scientific study, for a variety of reasons: understanding how results are communicated, how ideas spread, and assessing the influence of areas or individuals. However, most prior work has focused on extracting and analyzing citation and stylistic patterns. In this work, we introduce the notion of 'scienceography', which focuses on the writing of science. We provide a first large-scale study using data derived from the arXiv e-print repository. Crucially, our data includes the "source code" of scientific papers (the LaTeX source), which enables us to study features not present in the "final product", such as the tools used and private comments between authors. Our study identifies broad patterns and trends in two example areas, computer science and mathematics, as well as highlighting key differences in the way that science is written in these fields. Finally, we outline future directions to extend the new topic of scienceography. Comment: 13 pages, 16 figures. Sixth International Conference on FUN WITH ALGORITHMS, 201

arXiv.org e-Print Archive

CiteSeerX